Goto

Collaborating Authors

 cooperative learning


Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

Neural Information Processing Systems

There is a natural correlation between the visual and auditive elements of a video. In this work we leverage this connection to learn general and effective models for both audio and video analysis from self-supervised temporal synchronization. We demonstrate that a calibrated curriculum learning scheme, a careful choice of negative examples, and the use of a contrastive loss are critical ingredients to obtain powerful multi-sensory representations from models optimized to discern temporal synchronization of audio-video pairs. Without further fine-tuning, the resulting audio features achieve performance superior or comparable to the state-of-the-art on established audio classification benchmarks (DCASE2014 and ESC-50). At the same time, our visual subnet provides a very effective initialization to improve the accuracy of video-based action recognition models: compared to learning from scratch, our self-supervised pretraining yields a remarkable gain of +19.9% in action recognition accuracy on UCF101 and a boost of +17.7% on HMDB51.


Content Caching-Assisted Vehicular Edge Computing Using Multi-Agent Graph Attention Reinforcement Learning

Shen, Jinjin, Lin, Yan, Zhang, Yijin, Zhang, Weibin, Shu, Feng, Li, Jun

arXiv.org Artificial Intelligence

In order to avoid repeated task offloading and realize the reuse of popular task computing results, we construct a novel content caching-assisted vehicular edge computing (VEC) framework. In the face of irregular network topology and unknown environmental dynamics, we further propose a multi-agent graph attention reinforcement learning (MGARL) based edge caching scheme, which utilizes the graph attention convolution kernel to integrate the neighboring nodes' features of each agent and further enhance the cooperation among agents. Our simulation results show that our proposed scheme is capable of improving the utilization of caching resources while reducing the long-term task computing latency compared to the baselines.


Reviews: Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

Neural Information Processing Systems

The authors propose self-supervised learning of audio and video features, by means of a curriculum learning setup. In particular, a deep neural network is trained with a contrastive loss function; generating as large feature distances as possible when the video and audio fragment are out of sync, while generating as small distances as possible when they are in sync. The proposed self-supervised learning scheme gives good results for downstream processing, e.g., improving over pure supervised training from scratch. In general, I think this work is highly interesting, since it may significantly reduce the need for manually labelled data sets, and hence may be of significant impact on video & audio processing algorithms. I also think the work is original, although this is not very easy to determine exactly.


Cooperative Learning with Gaussian Processes for Euler-Lagrange Systems Tracking Control under Switching Topologies

Yang, Zewen, Dong, Songbo, Lederer, Armin, Dai, Xiaobing, Chen, Siyu, Sosnowski, Stefan, Hattab, Georges, Hirche, Sandra

arXiv.org Artificial Intelligence

This work presents an innovative learning-based approach to tackle the tracking control problem of Euler-Lagrange multi-agent systems with partially unknown dynamics operating under switching communication topologies. The approach leverages a correlation-aware cooperative algorithm framework built upon Gaussian process regression, which adeptly captures inter-agent correlations for uncertainty predictions. A standout feature is its exceptional efficiency in deriving the aggregation weights achieved by circumventing the computationally intensive posterior variance calculations. Through Lyapunov stability analysis, the distributed control law ensures bounded tracking errors with high probability. Simulation experiments validate the protocol's efficacy in effectively managing complex scenarios, establishing it as a promising solution for robust tracking control in multi-agent systems characterized by uncertain dynamics and dynamic communication structures.


Progressive Energy-Based Cooperative Learning for Multi-Domain Image-to-Image Translation

Song, Weinan, Zhu, Yaxuan, He, Lei, Wu, Yingnian, Xie, Jianwen

arXiv.org Artificial Intelligence

This paper studies a novel energy-based cooperative learning framework for multi-domain image-to-image translation. The framework consists of four components: descriptor, translator, style encoder, and style generator. The descriptor is a multi-head energy-based model that represents a multi-domain image distribution. The components of translator, style encoder, and style generator constitute a diversified image generator. Specifically, given an input image from a source domain, the translator turns it into a stylised output image of the target domain according to a style code, which can be inferred by the style encoder from a reference image or produced by the style generator from a random noise. Since the style generator is represented as an domain-specific distribution of style codes, the translator can provide a one-to-many transformation (i.e., diversified generation) between source domain and target domain. To train our framework, we propose a likelihood-based multi-domain cooperative learning algorithm to jointly train the multi-domain descriptor and the diversified image generator (including translator, style encoder, and style generator modules) via multi-domain MCMC teaching, in which the descriptor guides the diversified image generator to shift its probability density toward the data distribution, while the diversified image generator uses its randomly translated images to initialize the descriptor's Langevin dynamics process for efficient sampling.


Group-Agent Reinforcement Learning

Wu, Kaiyue, Zeng, Xiao-Jun

arXiv.org Artificial Intelligence

It can largely benefit the reinforcement learning (RL) process of each agent if multiple geographically distributed agents perform their separate RL tasks cooperatively. Different from multi-agent reinforcement learning (MARL) where multiple agents are in a common environment and should learn to cooperate or compete with each other, in this case each agent has its separate environment and only communicates with others to share knowledge without any cooperative or competitive behaviour as a learning outcome. In fact, this scenario exists widely in real life whose concept can be utilised in many applications, but is not well understood yet and not well formulated. As the first effort, we propose group-agent system for RL as a formulation of this scenario and the third type of RL system with respect to single-agent and multi-agent systems. We then propose a distributed RL framework called DDAL (Decentralised Distributed Asynchronous Learning) designed for group-agent reinforcement learning (GARL). We show through experiments that DDAL achieved desirable performance with very stable training and has good scalability.


Cooperative learning for multi-view analysis

Ding, Daisy Yi, Narasimhan, Balasubramanian, Tibshirani, Robert

arXiv.org Machine Learning

With new technologies in biomedicine, we are able to generate and collect data of various modalities, including genomics, epigenomics, transcriptomics, and proteomics (Figure 1A). Integrating heterogeneous features on a single set of observations provides a unique opportunity to gain a comprehensive understanding of an outcome of interest. It offers the potential for making discoveries that are hidden in data analyses of a single modality and achieving more accurate predictions of the outcome (Kristensen et al. 2014, Ritchie et al. 2015, Gligorijević et al. 2016, Karczewski & Snyder 2018, Ma et al. 2020). While "multi-view data analysis" can mean different things, we use it here in the context of supervised learning, where the goal is to fuse different data views to model an outcome of interest. To give a concrete example, assume that a researcher wants to predict cancer outcomes from RNA expression and DNA methylation measurements for a set of patients. The researcher suspects that: (1) both data views could potentially have prognostic value; (2) the two views share some underlying relationship with each other, as DNA methylation regulates gene expression and can repress the expression of tumor suppressor genes or promote the expression of oncogenes. Should the researcher use both data views for downstream prediction, or just use one view or the other?


Distributed Learning on Heterogeneous Resource-Constrained Devices

Rapp, Martin, Khalili, Ramin, Henkel, Jörg

arXiv.org Machine Learning

We consider a distributed system, consisting of a heterogeneous set of devices, ranging from low-end to high-end. These devices have different profiles, e.g., different energy budgets, or different hardware specifications, determining their capabilities on performing certain learning tasks. We propose the first approach that enables distributed learning in such a heterogeneous system. Applying our approach, each device employs a neural network (NN) with a topology that fits its capabilities; however, part of these NNs share the same topology, so that their parameters can be jointly learned. This differs from current approaches, such as federated learning, which require all devices to employ the same NN, enforcing a trade-off between achievable accuracy and computational overhead of training. We evaluate heterogeneous distributed learning for reinforcement learning (RL) and observe that it greatly improves the achievable reward on more powerful devices, compared to current approaches, while still maintaining a high reward on the weaker devices. We also explore supervised learning, observing similar gains.


Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

Korbar, Bruno, Tran, Du, Torresani, Lorenzo

Neural Information Processing Systems

There is a natural correlation between the visual and auditive elements of a video. In this work we leverage this connection to learn general and effective models for both audio and video analysis from self-supervised temporal synchronization. We demonstrate that a calibrated curriculum learning scheme, a careful choice of negative examples, and the use of a contrastive loss are critical ingredients to obtain powerful multi-sensory representations from models optimized to discern temporal synchronization of audio-video pairs. Without further fine-tuning, the resulting audio features achieve performance superior or comparable to the state-of-the-art on established audio classification benchmarks (DCASE2014 and ESC-50). At the same time, our visual subnet provides a very effective initialization to improve the accuracy of video-based action recognition models: compared to learning from scratch, our self-supervised pretraining yields a remarkable gain of 19.9% in action recognition accuracy on UCF101 and a boost of 17.7% on HMDB51.


A Dataset Schema for Cooperative Learning from Demonstration in Multi-robots Systems

Simões, Marco A. C., da Silva, Robson Marinho, Nogueira, Tatiane

arXiv.org Artificial Intelligence

To achieve these common goals, agents in a MAS should be capable of interacting with other agents, not simply by exchanging data, but by engaging as in social activities, such as those people participate in their daily lives: cooperation, coordination, negotiation, and the like. In MASs, agents are assumed to be autonomous - capable of making independent decisions about to do in order to satisfy their design objectives, and thus they need mechanisms that allow them to synchronize and to coordinate their activities at run time [31]. Although one of the main issues in MASs is the agents' coordination structure, this is not hard-wired at design time, as MASs are typically in standard concurrent/distributed systems. One well-known strategy for coordination in MAS is the design of multi-agent coordinated plans [7][35][36][33][14] that include, not only usual agents' actions defined by their effectors, but also communication actions to achieve the necessary synchronization and coordination. To represent communication actions, some specific languages were created, e.g.